Linear-Time Computation of Prefix Table for Weighted Strings

Authors

Carl Barton

Solon P. Pissis

Abstract

The prefix table of a string is one of the most fundamental data structures of algorithms on strings: it determines the longest factor at each position of the string that matches a prefix of the string. It can be computed in time linear with respect to the size of the string, and hence it can be used efficiently for locating patterns or for regularity searching in strings. A weighted string is a string in which a set of letters may occur at each position with respective occurrence probabilities. Weighted strings, also known as position weight matrices or uncertain strings, naturally arise in many biological contexts; for example, they provide a method to realise approximation among occurrences of the same DNA segment. In this article, given a weighted string x of length n and a constant cumulative weight threshold 1/z, defined as the minimal probability of occurrence of factors in x, we present an O(n)-time algorithm for computing the prefix table of x. Furthermore, we outline a number of applications of this result for solving various problems on non-standard strings, and present some preliminary experimental results.
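To make the two definitions in the abstract concrete, here is a minimal Python sketch. It is not the O(n)-time weighted-string algorithm of the paper: `prefix_table` is the classical linear-time computation for an ordinary (non-weighted) string, and `occurrence_probability` only illustrates how the probability of a factor at one position of a weighted string can be compared against the threshold 1/z. The function names, the 0-based indexing, the convention that the entry at position 0 equals n, and the list-of-dicts representation of a weighted string are all assumptions made for this illustration.

```python
def prefix_table(s):
    """Classical prefix table of an ordinary string s (0-based).

    pref[i] is the length of the longest factor starting at i
    that matches a prefix of s; by convention pref[0] = len(s).
    """
    n = len(s)
    if n == 0:
        return []
    pref = [0] * n
    pref[0] = n
    left, right = 0, 0  # s[left:right] is known to match s[0:right-left]
    for i in range(1, n):
        if i < right:
            # Reuse what is already known from inside the current window.
            pref[i] = min(right - i, pref[i - left])
        # Extend the match by explicit letter comparisons.
        while i + pref[i] < n and s[pref[i]] == s[i + pref[i]]:
            pref[i] += 1
        if i + pref[i] > right:
            left, right = i, i + pref[i]
    return pref


def occurrence_probability(w, i, u):
    """Probability that factor u occurs at position i of weighted string w.

    w is assumed to be a list of dicts mapping letters to probabilities
    (a hypothetical representation chosen for this sketch).
    """
    if i + len(u) > len(w):
        return 0.0
    p = 1.0
    for k, letter in enumerate(u):
        p *= w[i + k].get(letter, 0.0)
    return p


def is_valid_occurrence(w, i, u, z):
    """True if u occurs at position i with probability at least 1/z."""
    return occurrence_probability(w, i, u) >= 1.0 / z
```

For example, `prefix_table("aabaab")` gives `[6, 1, 0, 3, 1, 0]`. With `w = [{"a": 0.5, "b": 0.5}, {"a": 1.0}]`, the factor `"aa"` occurs at position 0 with probability 0.5, so it counts as an occurrence for z = 2 but not for a stricter threshold such as 3/4.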


Related resources

Enhanced Covers of Regular and Indeterminate Strings Using Prefix Tables

A cover of a string x = x[1..n] is a proper substring u of x such that x can be constructed from possibly overlapping instances of u. A recent paper [12] relaxes this definition — an enhanced cover u of x is a border of x (that is, a proper prefix that is also a suffix) that covers a maximum number of positions in x (not necessarily all) — and proposes efficient algorithms for the computation o...


Enhanced Covers of Regular & Indeterminate Strings using Prefix Tables


Computing Covers Using Prefix Tables

An indeterminate string x = x[1..n] on an alphabet Σ is a sequence of nonempty subsets of Σ; x is said to be regular if every subset is of size one. A proper substring u of regular x is said to be a cover of x iff for every i ∈ 1..n, an occurrence of u in x includes x[i]. The cover array γ = γ[1..n] of x is an integer array such that γ[i] is the longest cover of x[1..i]. Fifteen years ago a com...


Time Complexity of Knuth-Morris-Pratt String Matching Algorithm

This project centers on evaluating the time complexity of the Knuth-Morris-Pratt (KMP) string matching algorithm. The string matching problem is to locate a pattern string within a larger string. The best performance in terms of asymptotic time complexity is currently linear, given by the KMP algorithm. In this algorithm, firstly a prefix for the pattern string is computed and then based on this...


Improved Filters for the Approximate Suffix-Prefix Overlap Problem

Computing suffix-prefix overlaps for a large collection of strings is a fundamental building block for the analysis of genomic next-generation sequencing data. The approximate suffix-prefix overlap problem is to find all pairs of strings from a given set such that a prefix of one string is similar to a suffix of the other. Välimäki et al. (Information and Computation, 2012) gave a solution to t...



Journal: Theor. Comput. Sci.

Volume 656

Publication date: 2015
